Partial directed coherence (PDC) and directed coherence (DC), which describe
complementary aspects of the directed information flow between pairs of univariate
components that belong to a vector of simultaneously observed time series, have recently
been generalized as bPDC/bDC, respectively, to portray the relationship between subsets
of component vectors (Takahashi, 2009; Faes and Nollo, 2013). This generalization is
especially important for neuroscience applications, as one often wishes to address the
link between the set of time series from an observed ROI (region of interest) and the
series from some other physiologically relevant ROI. bPDC/bDC are limited,
however, in that several time series within a given subset may be irrelevant or may even
interact in opposition to one another, leading to interpretation difficulties. To
address this, we propose an alternative measure, termed cPDC/cDC, employing canonical
decomposition to reveal the main frequency domain modes of interaction between
the vector subsets. We also show that bPDC/bDC and cPDC/cDC are related and possess
mutual information rate interpretations. Numerical examples and a real data set illustrate
the concepts. The present contribution provides what is seemingly the first canonical
decomposition of information flow in the frequency domain.
1. INTRODUCTION

Human behavior is primarily thought of as a property that emerges
from the interaction of several brain areas, body parts, and
the environment. Understanding how these elements dynamically
interact is one of the major themes of systems neuroscience.
Several multivariate time series methods, old and new, have
been introduced to describe the interdependence between brain
areas using signal modalities like EEG, BOLD signals, MEG and
LFP; they are collectively called connectivity measures. Partial
directed coherence (PDC) (Baccala and Sameshima, 2001) and
directed coherence/directed transfer function (DC/DTF) (Kaminski
and Blinowska, 1991) are two examples of such connectivity
measures. Both describe complementary aspects (see Baccala and
Sameshima, 2014 for an in-depth discussion) of how information
flows between pairs of univariate time series components
that belong to a multivariate vector of simultaneously observed
time series (Takahashi et al., 2010). Recently, PDC and DC have
been generalized (as bPDC/bDC, respectively) to describe how
subsets (blocks) of components within a time series vector
interrelate (Takahashi, 2009; Faes and Nollo, 2013). This is especially
important for neuroscience applications, as one often wants to
investigate the interaction between sets of time series that are
circumscribed to an observed region of interest (ROI) with respect
to another physiologically relevant ROI (Nedungadi et al., 2011).
The potential relevance of this type of question alone justifies
looking for the deeper meaning of these measures in terms of
information theoretical quantities.



Despite their practical importance, bPDC/bDC suffer from the 
limitation that several time series within a given subset may be 
irrelevant or interact in opposition to one another thereby posing 
interpretation difficulties. Also, in several situations, a researcher 
may be interested in just the few "best" descriptions of interac- 
tion between two sets of time series but not in the total amount 
of information flowing between them. For a more concrete exam- 
ple, assume that two brain areas interact and that bPDC is large. 
In this situation, it does not straightforwardly follow that all 
brain region components are interacting in the same way, or even 
whether some such components may be ignored. One way to 
address this limitation is to decompose bPDC/bDC into different 
components weighted according to relevance.

The aim of this article is twofold: (a) to provide a proper infor- 
mation theoretic interpretation for bPDC/bDC and (b) to intro- 
duce a canonical decomposition of information flows, henceforth 
termed, respectively, canonical PDC/DC (cPDC/cDC). These new 
decompositions allow us to closely mimic classical canonical cor- 
relation analysis so that different dynamically relevant interaction 
modes between brain areas can be exposed. Due to PDC inter- 
pretability in terms of Granger causality (Baccala and Sameshima, 
2014), a consequence of the present formulation is that cPDC 
represents a long sought frequency domain counterpart to time 
domain canonical decompositions of Granger causality (Sato
et al., 2010; Ashrafulla et al., 2013).

The article is organized as follows. We first introduce the
background and notation necessary for the rest of the article
(section 2). In the results section (section 3), we first show that
both bPDC and bDC between two subsets of processes are block
coherences between suitably defined underlying processes. Then,
we demonstrate that such coherences are nothing but monotonic
transformations of the mutual information rate between
the respective processes (Gelfand and Yaglom, 1959; Takahashi
et al., 2010; Nedungadi et al., 2011), leading immediately to their
interpretability as mutual information rates. Next, we introduce
cPDC and cDC and prove that they are the non-zero eigenvalues
of the matrices whose determinants underlie the respective bPDC
and bDC definitions (section 4). Using simulated examples and
publicly available data, we illustrate the usefulness of cPDC/cDC
(section 5), followed by a brief discussion (section 6). Proof details
are left to the Appendix.

2. BACKGROUND 

Let $X_1, \ldots, X_K$ be $K$ distinct multivariate time series vectors with
dimensions $M_1, \ldots, M_K$. Using $T$ to indicate matrix transposition,
let $X(t) = [X_1(t)^T \ldots X_K(t)^T]^T$ for each time $t \in \mathbb{Z}$ be a second
order stationary time series with spectral density matrix $S(\omega)$ at
each frequency $\omega \in [-\pi, \pi)$. To justify our formal computation,
we assume that $S(\omega)$ is uniformly bounded from below and above
and invertible at all frequencies (Hannan, 1970). This is called
the boundedness condition, which guarantees that the following
autoregressive (AR) representation of $X$ holds in the mean square
sense

$$X(t) = \sum_{l=1}^{\infty} A(l)\, X(t-l) + \epsilon(t), \qquad (1)$$

where $\epsilon(t) = [\epsilon_1(t)^T \ldots \epsilon_K(t)^T]^T$ stands for a zero mean
innovation process, i.e., $E[\epsilon(t)\epsilon(t)^T] = \Sigma$ and $E[\epsilon(t)\epsilon(l)^T] = 0$ for
$l \neq t$. For $l \geq 1$, $A(l)$ are $(M_1 + \ldots + M_K)^2$-dimensional matrices.
Let $A_{pq}(l)$ for $p, q \in \{1, \ldots, K\}$ and $l \geq 1$ be $M_p \times M_q$-dimensional
matrices so that $A(l)$ has the following structure

$$A(l) = \begin{bmatrix} A_{11}(l) & \cdots & A_{1K}(l) \\ \vdots & \ddots & \vdots \\ A_{K1}(l) & \cdots & A_{KK}(l) \end{bmatrix}.$$

We define $A(\omega) = I - \sum_{l \geq 1} A(l)\, e^{-\mathbf{i}\omega l}$.

Under the boundedness condition, the following moving average
(MA) mean square sense representation for the process $X$ also
holds

$$X(t) = \sum_{l=0}^{\infty} H(l)\, \epsilon(t-l), \qquad (2)$$

where $H(l)$ for $l \geq 0$ are $(M_1 + \ldots + M_K)^2$-dimensional matrices.
Let $H(\omega) = \sum_{l \geq 0} H(l)\, e^{-\mathbf{i}\omega l}$. We have that $A(\omega) = H^{-1}(\omega)$
for all $\omega \in [-\pi, \pi)$. The superscript $*$ indicates the
matrix complex conjugate transpose.

Let $P(\omega) = S^{-1}(\omega)$. bPDC from the multivariate process $X_j$
to the process $X_i$ at frequency $\omega$, denoted $\pi^{(b)}_{ij}(\omega)$, is defined
(Takahashi, 2009; Faes and Nollo, 2013) by

$$\pi^{(b)}_{ij}(\omega) = 1 - \det\left(P_{jj}(\omega) - A^*_{ij}(\omega)\, \Sigma_{ii}^{-1}\, A_{ij}(\omega)\right) \det\left(P_{jj}(\omega)\right)^{-1}, \qquad (3)$$

where det indicates the determinant and the subscript indices
relate to the natural block structure associated with the matrices.
Let $\Theta = \Sigma^{-1}$. bDC from the multivariate process $X_j$ to the
process $X_i$ at frequency $\omega$, denoted $\gamma^{(b)}_{ij}(\omega)$, is defined (Takahashi,
2009; Faes and Nollo, 2013) by

$$\gamma^{(b)}_{ij}(\omega) = 1 - \det\left(S_{ii}(\omega) - H_{ij}(\omega)\, \Theta_{jj}^{-1}\, H^*_{ij}(\omega)\right) \det\left(S_{ii}(\omega)\right)^{-1}. \qquad (4)$$

Note that the present bDC definition differs slightly from the one
in Faes and Nollo (2013). We removed the unnecessary condition
of strict causality, i.e., diagonality of $\Sigma$, simply by substituting
$\Sigma_{jj}$ by $\Theta_{jj}^{-1}$ in their definition of bDC, as it is more suited for
formulating information theoretic results as shown ahead.
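Definitions (3) and (4) can be evaluated numerically once the AR coefficients and the innovation covariance are known. The following is a minimal sketch, not the authors' implementation: it assumes the coefficients are stored in an array of shape (p, n, n), that $H(\omega)$ is obtained by inverting $A(\omega)$, and that the spectral density is used up to a constant factor (which cancels in both ratios). The function names and the three-variable toy model are hypothetical.

```python
import numpy as np

def freq_matrices(A, Sigma, omega):
    """Return A(omega), H(omega), S(omega), P(omega) for AR coefficients A of shape (p, n, n)."""
    p, n, _ = A.shape
    phases = np.exp(-1j * omega * np.arange(1, p + 1))       # e^{-i omega l}, l = 1..p
    A_w = np.eye(n) - np.tensordot(phases, A, axes=(0, 0))   # I - sum_l A(l) e^{-i omega l}
    H_w = np.linalg.inv(A_w)                                 # assumption: H(omega) inverts A(omega)
    S_w = H_w @ Sigma @ H_w.conj().T                         # spectral density up to a constant
    P_w = A_w.conj().T @ np.linalg.inv(Sigma) @ A_w          # inverse of S_w
    return A_w, H_w, S_w, P_w

def bpdc(A, Sigma, omega, idx_i, idx_j):
    """Block PDC from block j to block i, following Equation (3)."""
    A_w, _, _, P_w = freq_matrices(A, Sigma, omega)
    A_ij = A_w[np.ix_(idx_i, idx_j)]
    Sig_ii = Sigma[np.ix_(idx_i, idx_i)]
    P_jj = P_w[np.ix_(idx_j, idx_j)]
    num = np.linalg.det(P_jj - A_ij.conj().T @ np.linalg.inv(Sig_ii) @ A_ij)
    return float(np.real(1.0 - num / np.linalg.det(P_jj)))

def bdc(A, Sigma, omega, idx_i, idx_j):
    """Block DC from block j to block i, following Equation (4) with Theta = inv(Sigma)."""
    _, H_w, S_w, _ = freq_matrices(A, Sigma, omega)
    Theta_jj = np.linalg.inv(Sigma)[np.ix_(idx_j, idx_j)]
    H_ij = H_w[np.ix_(idx_i, idx_j)]
    S_ii = S_w[np.ix_(idx_i, idx_i)]
    num = np.linalg.det(S_ii - H_ij @ np.linalg.inv(Theta_jj) @ H_ij.conj().T)
    return float(np.real(1.0 - num / np.linalg.det(S_ii)))

# Hypothetical toy model: block X1 = (Y1, Y2) drives X2 = (Y3), with no feedback.
A = np.array([[[0.5, 0.2, 0.0],
               [-0.2, 0.5, 0.0],
               [0.4, 0.0, 0.3]]])
Sigma = np.array([[1.0, 0.3, 0.0],
                  [0.3, 1.0, 0.0],
                  [0.0, 0.0, 1.0]])
X1, X2 = [0, 1], [2]
omega = 0.7
pdc_1_to_2 = bpdc(A, Sigma, omega, X2, X1)   # strictly between 0 and 1
pdc_2_to_1 = bpdc(A, Sigma, omega, X1, X2)   # zero: X2 does not feed X1
```

In this toy model the block from $X_2$ to $X_1$ is null in both $A(\omega)$ and $H(\omega)$, so both bPDC and bDC vanish in that direction while remaining strictly positive in the other.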

Consider a second-order stationary multivariate process
$W(t) = [Y(t)^T\ Z(t)^T]^T$. The block coherence between $Y$ and $Z$
at frequency $\omega$ is defined as (Nedungadi et al., 2011)

$$C_{YZ}(\omega) = 1 - \det\left(S_{WW}(\omega)\right) \det\left(S_{YY}(\omega)\right)^{-1} \det\left(S_{ZZ}(\omega)\right)^{-1}. \qquad (5)$$

Observe that we used the process name in the subscript of the 
power spectrum S to indicate the corresponding spectral density 
matrices. In the rest of the article, we will use interchangeably 
the process name or the corresponding indices in the subscript 
whenever there is no ambiguity. 

Another important definition is that of the mutual information
rate (MIR) between two multivariate strictly stationary processes
$Y$ and $Z$:

$$\mathrm{MIR}_{YZ} = \lim_{t \to \infty} \frac{1}{t}\, E\left[ \log \frac{dP(Y(1), \ldots, Y(t), Z(1), \ldots, Z(t))}{dP(Y(1), \ldots, Y(t))\, dP(Z(1), \ldots, Z(t))} \right]. \qquad (6)$$

The classical relationship between block coherence (Equation 5)
and mutual information rate (Equation 6) follows from

Theorem. (Gelfand and Yaglom, 1959; Pinsker, 1964) If $Y$ and $Z$
are jointly stationary Gaussian processes satisfying the boundedness
condition, we have that the MIR between $Y$ and $Z$ is given by

$$\mathrm{MIR}_{YZ} = -\frac{1}{4\pi} \int_{-\pi}^{\pi} \log\left(1 - C_{YZ}(\omega)\right) d\omega. \qquad (7)$$
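As a sanity check of how Equations (5) and (7) fit together, the sketch below evaluates the block coherence of a joint spectral matrix and integrates it numerically. It is an illustration under stated assumptions: the function names are hypothetical, the integral is approximated by a rectangle rule on a uniform grid, and the test case is correlated bivariate white noise, whose flat spectrum makes the coherence equal to the squared correlation at every frequency.

```python
import numpy as np

def block_coherence(S_ww, n_y):
    """Equation (5): 1 - det(S_WW) / (det(S_YY) det(S_ZZ)); Y is the first n_y components of W."""
    S_yy = S_ww[:n_y, :n_y]
    S_zz = S_ww[n_y:, n_y:]
    ratio = np.linalg.det(S_ww) / (np.linalg.det(S_yy) * np.linalg.det(S_zz))
    return float(np.real(1.0 - ratio))

def mutual_information_rate(coherence_fn, n_grid=2048):
    """Equation (7): -(1 / (4 pi)) times the integral of log(1 - C(omega)) over [-pi, pi)."""
    omegas = np.linspace(-np.pi, np.pi, n_grid, endpoint=False)
    values = np.array([np.log(1.0 - coherence_fn(w)) for w in omegas])
    integral = values.mean() * 2.0 * np.pi     # rectangle rule on a uniform grid
    return -integral / (4.0 * np.pi)

rho = 0.6
S_white = np.array([[1.0, rho], [rho, 1.0]])   # flat spectrum of correlated white noise
mir = mutual_information_rate(lambda w: block_coherence(S_white, 1))
expected = -0.5 * np.log(1.0 - rho ** 2)       # mutual information of a bivariate Gaussian pair
```

For this memoryless case the rate reduces to the familiar mutual information of two jointly Gaussian variables, $-\tfrac{1}{2}\log(1-\rho^2)$, which is what the numerical integral returns.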



Now, following Takahashi et al. (2010), we define, for
$i \in \{1, \ldots, K\}$, the partialized process $\eta_i$ by

$$\eta_i(t) = X_i(t) - E\left[X_i(t) \mid \{X_j(l),\ j \neq i,\ l \in \mathbb{Z}\}\right], \qquad (8)$$

where $E[\cdot \mid \cdot]$ henceforth denotes the best linear conditional
predictor. Likewise the partialized innovation process $\zeta_i$ for
$i \in \{1, \ldots, K\}$ is

$$\zeta_i(t) = \epsilon_i(t) - E\left[\epsilon_i(t) \mid \{\epsilon_j(t),\ j \neq i\}\right]. \qquad (9)$$

Observe that both the partialized process and the partialized innovation
process were defined in Takahashi et al. (2010), but for the special
case of scalar $\eta_i$ and $\zeta_i$.

3. RELATION BETWEEN bPDC/bDC AND MUTUAL 
INFORMATION RATE 

Our first result establishes the relationship between bPDC and 
block coherence and is analogous to Theorem 1 in Takahashi et al. 
(2010). 

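Theorem 1. Let $X$ satisfy the boundedness condition. For all
$i, j \in \{1, \ldots, K\}$ and all frequencies $\omega \in [-\pi, \pi)$ we have that

$$\pi^{(b)}_{ij}(\omega) = C_{\epsilon_i \eta_j}(\omega). \qquad (10)$$

A straightforward corollary is

Corollary 1. Let $X$ be a stationary Gaussian process and satisfy the
boundedness condition. For all $i, j \in \{1, \ldots, K\}$ we have that

$$\mathrm{MIR}_{\epsilon_i \eta_j} = -\frac{1}{4\pi} \int_{-\pi}^{\pi} \log\left(1 - \pi^{(b)}_{ij}(\omega)\right) d\omega. \qquad (11)$$

Similar results also hold for bDC.

Theorem 2. Let $X$ satisfy the boundedness condition. For all
$i, j \in \{1, \ldots, K\}$ and all frequencies $\omega \in [-\pi, \pi)$ we have that

$$\gamma^{(b)}_{ij}(\omega) = C_{X_i \zeta_j}(\omega). \qquad (12)$$

and

Corollary 2. Let $X$ be a stationary Gaussian process and satisfy the
boundedness condition. For all $i, j \in \{1, \ldots, K\}$, we have that

$$\mathrm{MIR}_{X_i \zeta_j} = -\frac{1}{4\pi} \int_{-\pi}^{\pi} \log\left(1 - \gamma^{(b)}_{ij}(\omega)\right) d\omega. \qquad (13)$$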



4. CANONICAL PDC AND DC 

Canonical correlation is a classical method developed initially 
by Hotelling (1936) to address the relationship between ran- 
dom vectors. Brillinger (1981) generalized the method for time 
series and gave an excellent account of the relationship between 
canonical correlation analysis and different ideas in multivariate 
statistics. Our formulation of canonical coherence is equivalent to 
the definition introduced by Brillinger (1981). 

Let $Y$ and $Z$ be respectively $M_Y$- and $M_Z$-dimensional jointly
second order stationary processes. To better understand the
relationship between $Y$ and $Z$, we can ask the following question:
Which components of $Y$ and $Z$ are most representative of the
interaction between the processes? One way to formalize this is to
consider filtering matrices $B_Y(l)$ ($1 \times M_Y$) and $B_Z(l)$ ($1 \times M_Z$),
for all $l \in \mathbb{Z}$, and define the scalar processes $b_Y$ and $b_Z$ by

$$b_Y(t) = \sum_{l \in \mathbb{Z}} B_Y(l)\, Y(t-l) \qquad (14)$$

and

$$b_Z(t) = \sum_{l \in \mathbb{Z}} B_Z(l)\, Z(t-l), \qquad (15)$$

so that $C_{b_Y b_Z}(\omega)$ is maximized for all $\omega \in [-\pi, \pi)$. If furthermore
$Y$ and $Z$ are jointly stationary Gaussian processes, then this
is equivalent to maximizing $\mathrm{MIR}_{b_Y b_Z}$.

Following the above idea, we define the first canonical coherence
between $Y$ and $Z$ at frequency $\omega$ by

$$C^{(1)}_{YZ}(\omega) = \sup_{B_Y, B_Z} C_{b_Y b_Z}(\omega). \qquad (16)$$

Assume that the supremum (Equation 16) is achieved for $b_Y$
and $b_Z$, which we call first canonical time series. Consider the
residual processes $Y^1(t) = Y(t) - E[Y(t) \mid \{b_Y(l),\ l \in \mathbb{Z}\}]$ and
$Z^1(t) = Z(t) - E[Z(t) \mid \{b_Z(l),\ l \in \mathbb{Z}\}]$. Observe that $Y^1$ and $Z^1$
are uncorrelated to the processes $b_Y$ and $b_Z$, respectively. The second
canonical coherence $C^{(2)}_{YZ}(\omega)$ is defined recursively on the
residues by $C^{(2)}_{YZ}(\omega) = C^{(1)}_{Y^1 Z^1}(\omega)$.

Analogously, for $2 \leq m \leq \min\{M_Y, M_Z\}$, considering the
residual processes

$$Y^m(t) = Y^{m-1}(t) - E\left[Y^{m-1}(t) \mid \{b_{Y^k}(l),\ l \in \mathbb{Z},\ k \in \{1, \ldots, m-1\}\}\right]$$

and

$$Z^m(t) = Z^{m-1}(t) - E\left[Z^{m-1}(t) \mid \{b_{Z^k}(l),\ l \in \mathbb{Z},\ k \in \{1, \ldots, m-1\}\}\right],$$

one may define the $m$-th canonical coherence as

$$C^{(m)}_{YZ}(\omega) = C^{(1)}_{Y^{m-1} Z^{m-1}}(\omega). \qquad (17)$$

In this way, it is possible to construct a hierarchy of coherences
where each element captures the dependence structure that is not
explained by the other elements.
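At a single frequency, the hierarchy above does not have to be built by explicit recursion. A minimal sketch, assuming the classical characterization of canonical coherences (in the sense of Brillinger, 1981) as the eigenvalues of $S_{YY}^{-1} S_{YZ} S_{ZZ}^{-1} S_{ZY}$: the function name, the random Hermitian test matrix and the ordering convention are illustrative choices, not taken from the article.

```python
import numpy as np

def canonical_coherences(S_ww, n_y):
    """Eigenvalues of inv(S_YY) S_YZ inv(S_ZZ) S_ZY at one frequency, largest first."""
    S_yy = S_ww[:n_y, :n_y]
    S_yz = S_ww[:n_y, n_y:]
    S_zz = S_ww[n_y:, n_y:]
    # S_ZY is the conjugate transpose of S_YZ because S_WW is Hermitian.
    M = np.linalg.solve(S_yy, S_yz) @ np.linalg.solve(S_zz, S_yz.conj().T)
    eig = np.sort(np.real(np.linalg.eigvals(M)))[::-1]
    k = min(n_y, S_ww.shape[0] - n_y)          # at most min(M_Y, M_Z) non-zero values
    return np.clip(eig[:k], 0.0, 1.0)

# Hypothetical Hermitian positive definite spectral matrix at one frequency.
rng = np.random.default_rng(0)
G = rng.standard_normal((5, 5)) + 1j * rng.standard_normal((5, 5))
S_ww = G @ G.conj().T + np.eye(5)

coh = canonical_coherences(S_ww, 2)            # Y = first 2 components, Z = last 3
block = np.real(1.0 - np.linalg.det(S_ww)
                / (np.linalg.det(S_ww[:2, :2]) * np.linalg.det(S_ww[2:, 2:])))
recombined = 1.0 - np.prod(1.0 - coh)          # agrees with the block coherence (5)
```

The last two lines illustrate why the canonical values can be read as a decomposition: one minus the product of their complements reproduces the block coherence of Equation (5).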

Finally, we introduce cPDC and cDC. For $m \leq \min\{M_i, M_j\}$,
the $m$-th canonical PDC from $j$ to $i$ at frequency $\omega$, denoted
$\pi^{(c_m)}_{ij}(\omega)$, is defined by

$$\pi^{(c_m)}_{ij}(\omega) = C^{(m)}_{\epsilon_i \eta_j}(\omega). \qquad (18)$$

Similarly, the $m$-th canonical DC from $j$ to $i$ at frequency $\omega$,
denoted $\gamma^{(c_m)}_{ij}(\omega)$, is defined by

$$\gamma^{(c_m)}_{ij}(\omega) = C^{(m)}_{X_i \zeta_j}(\omega). \qquad (19)$$

At first sight, it is unclear whether the canonical PDC and DC
exist at all or even if they are uniquely defined. More importantly,
it is not obvious that they can be computed. Despite these
initial uncertainties, we show next that canonical coherences are
consistently defined as the non-null eigenvalues of some specific
matrices.
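Let $\lambda_m(Q)$ denote the $m$-th eigenvalue of a matrix $Q$, ordered
from its largest to its smallest value. The following theorem
furnishes a practical way to calculate cPDC and cDC.

Theorem 3. Under the boundedness condition for $X$ the following
identities hold:

$$\pi^{(c_m)}_{ij}(\omega) = \lambda_m\left(A^*_{ij}(\omega)\, \Sigma_{ii}^{-1}\, A_{ij}(\omega)\, P_{jj}^{-1}(\omega)\right) \qquad (20)$$

and

$$\gamma^{(c_m)}_{ij}(\omega) = \lambda_m\left(H_{ij}(\omega)\, \Theta_{jj}^{-1}\, H^*_{ij}(\omega)\, S_{ii}^{-1}(\omega)\right). \qquad (21)$$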



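Furthermore it is possible to relate bPDC/bDC and
cPDC/cDC via

Theorem 4. Under the same conditions of Theorem 3 the following
identities hold:

$$\pi^{(b)}_{ij}(\omega) = 1 - \prod_{m=1}^{\min\{M_i, M_j\}} \left(1 - \pi^{(c_m)}_{ij}(\omega)\right) \qquad (22)$$

and

$$\gamma^{(b)}_{ij}(\omega) = 1 - \prod_{m=1}^{\min\{M_i, M_j\}} \left(1 - \gamma^{(c_m)}_{ij}(\omega)\right). \qquad (23)$$

A simple consequence of Equations (22), (23) is that for stationary
Gaussian processes satisfying the boundedness condition, we
now have a decomposition of the mutual information rates

$$\mathrm{MIR}_{\epsilon_i \eta_j} = \sum_{m=1}^{\min\{M_i, M_j\}} -\frac{1}{4\pi} \int_{-\pi}^{\pi} \log\left(1 - \pi^{(c_m)}_{ij}(\omega)\right) d\omega \qquad (24)$$

and

$$\mathrm{MIR}_{X_i \zeta_j} = \sum_{m=1}^{\min\{M_i, M_j\}} -\frac{1}{4\pi} \int_{-\pi}^{\pi} \log\left(1 - \gamma^{(c_m)}_{ij}(\omega)\right) d\omega. \qquad (25)$$

Note how the quantities being summed in Equations (24), (25)
are formally themselves contributions to the mutual information
written in terms of their canonical coherence contributions.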
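The link between the block and canonical quantities can be sketched with a single determinant identity. The lines below treat the PDC case only and are a sketch, not the Appendix proof; the shorthand $M_{ij}(\omega)$ is introduced here purely for illustration, and the cPDC components are assumed to be the non-zero eigenvalues of that matrix.

```latex
M_{ij}(\omega) := A^*_{ij}(\omega)\,\Sigma_{ii}^{-1}\,A_{ij}(\omega)\,P_{jj}^{-1}(\omega)

1 - \pi^{(b)}_{ij}(\omega)
  = \det\!\left(P_{jj}(\omega) - A^*_{ij}(\omega)\,\Sigma_{ii}^{-1}\,A_{ij}(\omega)\right)\det\!\left(P_{jj}(\omega)\right)^{-1}
  = \det\!\left(I - M_{ij}(\omega)\right)
  = \prod_{m=1}^{M_j}\left(1 - \lambda_m\!\left(M_{ij}(\omega)\right)\right)
  = \prod_{m=1}^{\min\{M_i, M_j\}}\left(1 - \pi^{(c_m)}_{ij}(\omega)\right)

\log\!\left(1 - \pi^{(b)}_{ij}(\omega)\right)
  = \sum_{m=1}^{\min\{M_i, M_j\}} \log\!\left(1 - \pi^{(c_m)}_{ij}(\omega)\right)
```

The last equality of the second line uses the fact that $M_{ij}(\omega)$ has rank at most $\min\{M_i, M_j\}$, so any remaining eigenvalues are zero and contribute unit factors. Integrating the third line over $[-\pi, \pi)$ and multiplying by $-1/(4\pi)$ turns the product into the sum of mutual information rate contributions.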

5. ILLUSTRATIONS 
5.1. SIMULATED MODELS 

Example 1. To provide insight into cPDC behavior, we begin with 
a very simple example that can be fully and explicitly solved. 

Let a vector of observed time series $[Y_1, Y_2, Y_3, Y_4]$ be a real
valued autoregressive process of order $p = 1$ and $\Sigma = I$. The
autoregressive coefficients of the model are described by

$$A(1) = \begin{pmatrix} .5 & f & 0 & 0 \\ e & .5 & 0 & 0 \\ a & b & .5 & h \\ c & d & g & .5 \end{pmatrix} \qquad (26)$$

as in Figure 1.

FIGURE 1 | Connectivity diagram for Example 1. The number of
canonical components depends on the value of $ad - bc$.

By adopting time series blocks as $X_1 = [Y_1\ Y_2]$ and $X_2 = [Y_3\ Y_4]$,
when $e = f = g = h = 0$, direct computation shows that
the canonical PDC from block $X_2$ to $X_1$ is zero, i.e.,
$\pi^{(c_1)}_{12}(\omega) = \pi^{(c_2)}_{12}(\omega) = 0$ for all $\omega$ (reflecting the nullity of the
$2 \times 2$ upper right block of $A(1)$), whereas the coupling in the opposite direction
contributes two distinct components:

$$\pi^{(c_1)}_{21}(\omega) = \frac{a^2 + b^2 + c^2 + d^2 + \sqrt{(a^2 + b^2 + c^2 + d^2)^2 - 4(ad - bc)^2}}{2.5 - 2\cos(\omega) + a^2 + b^2 + c^2 + d^2 + \sqrt{(a^2 + b^2 + c^2 + d^2)^2 - 4(ad - bc)^2}} \qquad (27)$$

and

$$\pi^{(c_2)}_{21}(\omega) = \frac{a^2 + b^2 + c^2 + d^2 - \sqrt{(a^2 + b^2 + c^2 + d^2)^2 - 4(ad - bc)^2}}{2.5 - 2\cos(\omega) + a^2 + b^2 + c^2 + d^2 - \sqrt{(a^2 + b^2 + c^2 + d^2)^2 - 4(ad - bc)^2}}. \qquad (28)$$

For $ad = bc$, i.e., if the lower left $2 \times 2$ block determinant of $A(1)$
is zero as well, the total number of non-zero cPDC components
reduces to just 1.
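Example 1 is small enough to check numerically. The sketch below assumes $e = f = g = h = 0$ and $\Sigma = I$, obtains the two cPDC components from $X_1$ to $X_2$ as eigenvalues of $A^*_{21}A_{21}P_{11}^{-1}$, and compares them with the closed form $N_\pm / (2.5 - 2\cos\omega + N_\pm)$, where $N_\pm = a^2+b^2+c^2+d^2 \pm \sqrt{(a^2+b^2+c^2+d^2)^2 - 4(ad-bc)^2}$. The function names and the parameter values are illustrative.

```python
import numpy as np

def cpdc_example1(a, b, c, d, omega):
    """cPDC components from X1 = (Y1, Y2) to X2 = (Y3, Y4) with e = f = g = h = 0, Sigma = I."""
    A1 = np.array([[0.5, 0.0, 0.0, 0.0],
                   [0.0, 0.5, 0.0, 0.0],
                   [a,   b,   0.5, 0.0],
                   [c,   d,   0.0, 0.5]])
    A_w = np.eye(4) - A1 * np.exp(-1j * omega)     # A(omega) for an AR(1) model
    P_w = A_w.conj().T @ A_w                       # P = A^* Sigma^{-1} A with Sigma = I
    A_21 = A_w[2:, :2]
    M = A_21.conj().T @ A_21 @ np.linalg.inv(P_w[:2, :2])
    return np.sort(np.real(np.linalg.eigvals(M)))[::-1]

def closed_form(a, b, c, d, omega):
    """Closed-form components N / (2.5 - 2 cos(omega) + N), larger one first."""
    t = a * a + b * b + c * c + d * d
    root = np.sqrt(t * t - 4.0 * (a * d - b * c) ** 2)
    base = 2.5 - 2.0 * np.cos(omega)
    return np.array([(t + root) / (base + t + root),
                     (t - root) / (base + t - root)])

generic = cpdc_example1(0.5, 0.3, -0.2, 1.0, 0.7)      # two non-zero components
degenerate = cpdc_example1(1.0, 2.0, 2.0, 4.0, 0.7)    # ad = bc: a single non-zero component
```

With $ad = bc$ the square root equals $a^2+b^2+c^2+d^2$, so the second numerator vanishes and only one component survives, in line with the statement above.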

Even if $e, f, g, h$ are non-zero, i.e., regardless of intrablock
dynamics, $a = b = 0$ suffices to produce the single non-zero
$\pi^{(c_1)}_{21}(\omega)$ component (shown in Figure 2A) since block $X_1$
interacts with block $X_2$ exclusively through $Y_4$, i.e., $\pi^{(c_2)}_{21}(\omega) = 0$. In
this case, since only $Y_4$ is directly impacted by the interaction,
only one combined source of variance exists even though two
links exist between the blocks. Likewise, if $b = d = 0$, even though
two links leave $X_1$, there is only one dynamical component that
counts.

This contrasts with the situation when $b = c = 0$, where two
non-zero $\pi^{(c_m)}_{21}(\omega)$ coexist (Figure 2B) regardless of the values of
$e, f, g, h$ which, nonetheless, contribute to the relative size of the
components.

Example 2. In the next example, a 10-variate time series
$(Y_1, \ldots, Y_{10})$ follows the connectivity diagram represented in
Figure 3. The multivariate time series is divided into four blocks
($X_1$, $X_2$, $X_3$, and $X_4$), where $X_4$ only sends information and $X_3$,
which is an integrative block, only receives information. Block
$X_1$ has two functionally distinct internal parts, and only one is
reached by outside influence. The scenario is fairly complicated
and we next illustrate the usefulness of cPDC/cDC for understanding
the underlying dynamic interaction between blocks.
FIGURE 2 | Illustrative plots of the observations in Example 1. (A)
cPDC$_{21}$ results for $e = f = g = h = 0$ in Example 1 revealing just one
non-zero component under the $ad = bc$ condition. (B) cPDC$_{21}$ when
$b = c = 0$ and non-zero $a$ and $d$ in Example 1 leading to two non-zero
components for any non-zero values of the $e, f, g, h$ coefficients (the graph
shown was produced using $a = 0.5$, $b = 0$, $c = 0$, $d = 1$, $e = 0.3$, $f = -0.1$,
$g = 0.3$, $h = 0.4$).







FIGURE 3 | Connectivity diagram for Example 2 portraying the
block sets. Note the effect of the value of the $a$ parameter on cDC (Figure 5
versus Figure 6).



To help interpret the results, we begin by describing the non- 
zero model coefficients and their dynamical effects. Observe that 
the model subscript indices in this example indicate the corre- 
sponding scalar process and not the block number. 

1. Block $X_1 = [Y_1\ Y_2\ Y_3\ Y_4\ Y_5]$

$A_{1,1}(1) = 1.98\cos(\pi/50)$, $A_{1,1}(2) = -(.99)^2$, (low frequency oscillator in $Y_1$) (29)
$A_{2,3}(1) = 1$, (30)
$A_{3,3}(1) = 1.98\cos(\pi/2)$, $A_{3,3}(2) = -(.99)^2$, (oscillator at midband ($\pi/2$) in $Y_3$) (31)
$A_{5,4}(1) = .99$, $A_{4,5}(1) = -.99$, (oscillator at midband in $[Y_4\ Y_5]$) (32)
$A_{8,2}(1) = 1$, $A_{8,2}(3) = 1$, (midband notch) (33)
$A_{6,3}(1) = 1$, $A_{6,3}(3) = 1$, (midband notch) (34)
$A_{9,1}(1) = 1$, $A_{9,5}(1) = 1$. (35)

2. Block $X_2 = [Y_6\ Y_7]$

$A_{6,7}(1) = .99$, $A_{7,6}(1) = -.99$, (oscillator identical to the $[Y_4\ Y_5]$) (36)
$A_{9,6}(1) = 1$. (37)

3. Block $X_3 = [Y_8\ Y_9]$

$A_{8,8}(1) = -1$, $A_{9,8}(1) = .5$. (38)

4. Block $X_4 = [Y_{10}]$

$A_{10,10}(1) = 1.98\cos(2\pi/3)$, $A_{10,10}(2) = -(.99)^2$, (high frequency oscillator in $Y_{10}$) (39)
$A_{4,10}(1) = a$, (40)
$A_{7,10}(1) = 1$. (41)
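For readers who want to experiment with Example 2, the coefficient list can be packed into an array. The sketch below is a transcription of Equations (29) to (41) as listed above, under the assumption that $A_{r,c}(l)$ is the lag-$l$ coefficient from $Y_c$ into $Y_r$; the function name, the `put` helper and the `BLOCKS` dictionary are hypothetical, and indices are shifted to Python's 0-based convention.

```python
import numpy as np

def example2_coefficients(a=1.0):
    """AR coefficients of Example 2; A[l-1, r-1, c-1] holds A_{r,c}(l) for lags l = 1, 2, 3."""
    A = np.zeros((3, 10, 10))

    def put(target, source, lag, value):
        # 1-based indices as in the text, converted to 0-based array positions
        A[lag - 1, target - 1, source - 1] = value

    r2 = -(0.99 ** 2)
    put(1, 1, 1, 1.98 * np.cos(np.pi / 50)); put(1, 1, 2, r2)         # (29) low frequency oscillator
    put(2, 3, 1, 1.0)                                                 # (30)
    put(3, 3, 1, 1.98 * np.cos(np.pi / 2)); put(3, 3, 2, r2)          # (31) midband oscillator
    put(5, 4, 1, 0.99); put(4, 5, 1, -0.99)                           # (32) oscillator in [Y4 Y5]
    put(8, 2, 1, 1.0); put(8, 2, 3, 1.0)                              # (33) midband notch
    put(6, 3, 1, 1.0); put(6, 3, 3, 1.0)                              # (34) midband notch
    put(9, 1, 1, 1.0); put(9, 5, 1, 1.0)                              # (35)
    put(6, 7, 1, 0.99); put(7, 6, 1, -0.99)                           # (36) oscillator in [Y6 Y7]
    put(9, 6, 1, 1.0)                                                 # (37)
    put(8, 8, 1, -1.0); put(9, 8, 1, 0.5)                             # (38)
    put(10, 10, 1, 1.98 * np.cos(2 * np.pi / 3)); put(10, 10, 2, r2)  # (39) high frequency oscillator
    put(4, 10, 1, a)                                                  # (40) the free parameter a
    put(7, 10, 1, 1.0)                                                # (41)
    return A

# 0-based positions of the scalar series in each block
BLOCKS = {"X1": [0, 1, 2, 3, 4], "X2": [5, 6], "X3": [7, 8], "X4": [9]}

A_minus = example2_coefficients(a=-1.0)   # the case a = -1
A_plus = example2_coefficients(a=1.0)     # the case a = 1
```

The two arrays differ only in the entry for $A_{4,10}(1)$, which makes it easy to contrast the $a = -1$ and $a = 1$ cases.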



The resulting cPDC components can be appreciated in Figure 4
for $|a| = 1$. Among their interesting features is the existence of
the notch filtered link from $X_1$ to $X_2$ and to $X_3$ at midband. The
effects of the low frequency dynamics due to $Y_1$ and the midband
resonance due to $[Y_4\ Y_5]$ manifest themselves as the strongest
component from $X_1$ to $X_3$. Likewise, the single link effect from $X_2$ to
$X_3$ is readily apparent, as are the higher frequency resonances from $X_4$
toward both $X_1$ and $X_2$. Both $X_3$ components are identically equal
to 1 since nothing leaves the block.

The corresponding cDCs are portrayed in Figure 5 for $a = -1$
with no signal reachability from $X_4$ to $X_3$. This contrasts markedly
with Figure 6 for $a = 1$, where $X_4$'s indirect effects on $X_3$ are not
balanced out.

The effects of the notch connections are readily apparent in
both cases. For example, the power associated with the notch
frequencies is due to components local to $X_2$ and $X_3$ and cannot
be attributed to outside influence. For block $X_1$ only one of the
five components is different from 1, reflecting the contribution
coming from $X_4$.

5.2. EMPIRICAL DATA 

This example is based on EEG data borrowed from Sameshima
et al. (2014) (Ex. 7.7), which describes a left mesial temporal
ictal episode monitored using an extended 10-20 system. The
midline electrodes were excluded, and left (L) and right (R) side
electrodes were grouped according to whether they were frontal (F),
central (C), parietal (P), temporal (T) or occipital (O), leading to the
canonical PDCs portrayed in Figure 7, where the most important
connecting blocks share a dominant low pass frequency canonical
component of nearly identical shape, pointing to the existence
FIGURE 4 | The cPDC for Example 2 reflects the existence of the
notch connecting filters from $X_1$ to $X_2$ and to $X_3$. The intrinsic
dynamics of the oscillators from a subregion of $X_1$ into $X_3$ is apparent
in the resonances of the largest cPDC component. The resonance
within block $X_2$ manifests itself in the single non-zero component into
$X_3$, while the effect of $X_4$ reaches symmetrically into $X_1$ and $X_2$ via its
single dynamic component. In this and the following two figures, each
subfigure may contain up to five cPDC/cDC components, given by
$\min\{M_i, M_j\}$ as in Equations (22)/(23), represented in red, blue, yellow,
green, or black lines in decreasing order of magnitude.



of a shared dominant connectivity dynamics behind the observa- 
tion, see Figure 8A. Their connectivity is further summarized in 
Figure 8B. 

6. DISCUSSION 

We showed that bPDC/bDC introduced in Takahashi (2009) and 
Faes and Nollo (2013) are block coherences between properly 
chosen vector time series. When the time series are Gaussian, 
this implies that bPDC/bDC represent mutual information rates 
between well defined underlying vector time series. This fully 
generalizes the results presented in Takahashi et al. (2010). To 
enhance the understanding of the possibly complex interaction 
between multiple time series and overcome some bPDC/bDC 
limitations, we showed that the latter can be decomposed in 
canonical terms that we call cPDC/cDC. These decompositions 
represent the various different modes of interaction whereby 
sets of time series interact. We introduced an explicit way to 
compute these new quantities and proved some of their prop- 
erties. The usefulness of cPDC/cDC was illustrated by three 
examples. 

6.1. bPDC AND bDC AS BLOCK COHERENCES 

Takahashi et al. (2010) showed that PDC from the j-th scalar time
series to the i-th scalar time series is the coherence between the
i-th innovation process and the j-th partialized process, with a
similar result for DC. It is natural to ask whether an analogous
result holds for bPDC and bDC. We showed that this is indeed
the case, where bPDC/bDC represent block coherences relating
subsets of adequately defined innovation/partialized processes
(Takahashi, 2009; Nedungadi et al., 2011). At first sight
these identities may seem surprising, as both bPDC and bDC
are fully multivariate and directional measures of dependence,
whereas block coherences are at once block-pairwise and symmetric
measures of dependence. Yet careful reading of Theorems
1 and 2 highlights that bPDC/bDC from j to i and bPDC/bDC
from i to j are, in general, block coherences between distinct
pairs of vector processes, which explains their asymmetric nature
and lends them their directed connectivity character. Also, we
note that for both bPDC and bDC, the coherences involve
innovation process subsets, which explains their fully multivariate
characteristic as measures. Another interesting observation
is that since the innovation processes are uncorrelated to the
past of the partialized processes by construction, in the case
of bPDC only innovations in the past of the partialized process
contribute to the coherence, which explains why bPDC is
a directed measure of dependence. An analogous observation
holds for bDC. In the Gaussian case, the bPDC/bDC representation
as a block coherence allows relating them to the mutual
information rate between suitably chosen time series. Formally
this justifies the idea that these quantities are de facto measures
of information flow. For an interesting comparison between
bPDC/bDC and Geweke's measure of linear feedback see Faes
FIGURE 5 | cDC for Example 2 for $a = -1$, leading to a cancelation of the effect of $X_4$ on $X_3$ as the signal travels indirectly through two exactly
identical structures but with opposite phases before reaching $X_3$. The notch filtering action is also apparent from the cDCs from $X_1$ to $X_2$ and $X_3$.
FIGURE 6 | cDC for Example 2 with $a = 1$, which differs from Figure 5 in the effect from $X_4$ to $X_3$, which no longer cancels out.
FIGURE 7 | cPDC from the Empirical Data example (section 5.2) from the left mesial ictal episode where the largest components are represented
either in red or green. cPDC values in red were arbitrarily considered significant and were pictorially summarized in Figure 8B.



and Nollo (2013). As a small note for the reader, we observe that 
our definition of bPDC/bDC is slightly more general than the 
one proposed by Faes and Nollo (2013) because the covariance 
matrix of the innovations does not need to be diagonal as they 
assumed. 

6.2. CANONICAL DECOMPOSITION OF DIRECTIONAL MEASURES 

Given a pair of random vectors, it is natural to ask how to mea- 
sure/represent dependence between them. In statistics, there are 
two main methods, both inspired by the basic Pearson correla- 
tion, to address this. The first one generalizes Pearson correlation 
directly using the determinants of the covariance matrix between 
and within each set of random variables. For time series, the 
equivalent measure in the frequency domain is the block coher- 
ence and the directed versions are bPDC and bDC. A second 
generalization rests on the idea of canonical correlation intro- 
duced by Hotelling (1936). There are several generalizations of 
canonical correlation for time series tailored specifically to infer
Granger causality in the time domain (Sato et al., 2010; Wu et al.,
2011), but, to the best of our knowledge, cPDC and cDC are the
first proposals of canonical measures of directed dependence in 
the frequency domain. 



One advantage of cPDC/cDC over bPDC/bDC is that canon- 
ical decomposition allows inferring the various different existing 
modes of interaction between sets of time series in close analogy 
to what is done for classical canonical correlation and princi- 
pal component analyses. One should expect this to be useful 
when several signals are redundant, generated by similar mech- 
anisms, or when there are several time series that do not signif- 
icantly contribute to the interaction between sets of time series, 
e.g., when there are many brain areas that are not interacting 
with each other during some specific behavior. Besides, as we 
show in Theorem 4, we can recover the bPDC/bDC from the 
cPDC/cDC. 

6.3. INTERPRETING cPDC/cDC 

The main practical interest of cPDC/cDC is to allow the simplifi- 
cation of connectivity interpretations whilst giving new insights 
into the dynamical interaction between neural structures. We 
illustrated the achievable simplification using an EEG data set 
from an epileptic patient. We also showed how cPDC is related 
to the number of "modes" of interaction between sets of time 
series through the simple numerical Example 1 and via the slightly 
more complex Example 2. We expect that cPDC/cDC together 
FIGURE 8 | (A) This corresponds to Figure 7.13 from Sameshima et al.
(2014) showing the gPDC connectivity graph (see arrows) and the scalp
electrode grouping sets corresponding to frontal (LF, RF), temporal (LT,
RT), central (LC, RC), parietal (LP, RP), and occipital (LO, RO) areas.
The midline electrodes in gray were not considered in this analysis.
(B) Diagram for the significant first cPDC components in Figure 7 (red
lines) showing the connections between the scalp electrode sets shown in (A). Notice
some divergences between the gPDC and cPDC graphs, possibly due to the
lack of rigorous statistics for cPDC significance level
estimation: for instance, there is cPDC from RO to LO (B), but gPDC from
O2 to O1 is absent (A), while there is gPDC from C4 to T1 without
corresponding cPDC from RC to LT.



with bPDC/bDC become useful tools for handling high dimen- 
sional data sets that are increasingly being recorded by several 
researchers. 

We propose that a reasonable way to understand the usefulness
of cPDC/cDC is to make an analogy with classical principal
component and canonical correlation analyses. Therefore,
similar heuristics could be applied in practical situations, for
example, to decide the number of different components to
include in the interpretation. The canonical time series $b_Y$
and $b_Z$ from section 4 (see also Brillinger, 1981) are analogous
to the canonical variables from the classical canonical
correlation analysis and can play a similar role for result
interpretation.

Finally, we remark that the computational procedures used for
the present paper will be made available at the PDC homepage at
http://www.lcs.poli.usp.br/~baccala/pdc/canon together with the
data used in section 5.2.

ACKNOWLEDGMENTS 

CNPq Grants 307163/2013-0 to Luiz A. Baccala and 309381/2012-6
to Koichi Sameshima are gratefully acknowledged,
as is the support of NAPNA (Nucleo de Neurociencia Aplicada) from the
University of Sao Paulo. Part of this work took place during
FAPESP Grant 2005/56464-9 (CInAPCe). Daniel Y.
Takahashi was partially supported by a Pew Latin American
Fellowship and a Ciencia sem Fronteiras Fellowship, CNPq grant
(246778/2012-1).

REFERENCES 

Ashrafulla, S., Haldar, J. P., Joshi, A. A., and Leahy, R. M. (2013). Canonical
Granger causality between regions of interest. Neuroimage 83, 189-199. doi:
10.1016/j.neuroimage.2013.06.056

Baccala, L. A., and Sameshima, K. (2001). Partial directed coherence: a new
concept in neural structure determination. Biol. Cybern. 84, 463-474. doi:
10.1007/PL00007990

Baccala, L. A., and Sameshima, K. (2014). "Multivariate time series brain
connectivity: a sum up," in Methods in Brain Connectivity Inference Through
Multivariate Time Series Analysis, eds K. Sameshima and L. A. Baccala (Boca
Raton: CRC Press), 245-251. doi: 10.1201/b16550-18

Brillinger, D. R. (1981). Time Series: Data Analysis and Theory. Classics in Applied
Mathematics, Vol. 36. San Francisco, CA: SIAM, Society for Industrial and
Applied Mathematics; Holden-Day.

Faes, L., and Nollo, G. (2013). Measuring frequency domain Granger causality
for multiple blocks of interacting time series. Biol. Cybern. 107, 217-232. doi:
10.1007/s00422-013-0547-5

Gelfand, I. M., and Yaglom, A. M. (1959). Calculation of amount of information
about a random function contained in another such function. Am. Math. Soc.
Transl. Ser. 2, 3-52.

Hannan, E. J. (1970). Multiple Time Series (Wiley Series in Probability and
Mathematical Statistics). New York, NY: Wiley. doi: 10.1002/9780470316429

Hotelling, H. (1936). Relations between two sets of variates. Biometrika 28,
321-377. doi: 10.2307/2333955

Kaminski, M., and Blinowska, K. J. (1991). A new method of the description
of the information flow in brain structures. Biol. Cybern. 65, 203-210. doi:
10.1007/BF00198091






Lutkepohl, H. (1996). Handbook of Matrices. Chichester: John Wiley.

Nedungadi, A. G., Ding, M., and Rangarajan, G. (2011). Block coherence: a method for measuring the interdependence between two blocks of neurobiological time series. Biol. Cybern. 104, 197-207. doi: 10.1007/s00422-011-0429-7

Pinsker, M. S. (1964). Information and Information Stability of Random Variables and Processes. San Francisco, CA: Holden-Day.

Sameshima, K., Takahashi, D. Y., and Baccala, L. A. (2014). "Asymptotic PDC properties," in Methods in Brain Connectivity Inference through Multivariate Time Series Analysis, eds K. Sameshima and L. A. Baccala (Boca Raton: CRC Press), 113-131. doi: 10.1201/b16550-9

Sato, J. R., Fujita, A., Cardoso, E. F., Thomaz, C. E., Brammer, M. J., and Amaro, E. Jr. (2010). Analyzing the connectivity between regions of interest: an approach based on cluster Granger causality for fMRI data analysis. Neuroimage 52, 1444-1455. doi: 10.1016/j.neuroimage.2010.05.022

Takahashi, D. Y. (2009). Medidas de Fluxo de Informacao com Aplicacao em Neurociencia. Ph.D. thesis, University of Sao Paulo. Available online at: http://www.teses.usp.br/teses/disponiveis/95/95131/tde-07062011-115256/en.php

Takahashi, D. Y., Baccala, L. A., and Sameshima, K. (2010). Information theoretic interpretation of frequency domain connectivity measures. Biol. Cybern. 103, 463-469. doi: 10.1007/s00422-010-0410-x

Wu, G. R., Chen, F., Kang, D., Zhang, X., Marinazzo, D., and Chen, H. (2011). Multiscale causal connectivity analysis by canonical correlation: theory and application to epileptic brain. IEEE Trans. Biomed. Eng. 58, 3088-3096. doi: 10.1109/TBME.2011.2162669

Conflict of Interest Statement: The authors declare that the research was conducted
in the absence of any commercial or financial relationships that could be
construed as a potential conflict of interest.

Received: 17 January 2014; accepted: 23 April 2014; published online: 30 May 2014. 
Citation: Takahashi DY, Baccala LA and Sameshima K (2014) Canonical information
flow decomposition among neural structure subsets. Front. Neuroinform. 8:49. doi:
10.3389/fninf.2014.00049

This article was submitted to the journal Frontiers in Neuroinformatics. 
Copyright © 2014 Takahashi, Baccala and Sameshima. This is an open-access article 
distributed under the terms of the Creative Commons Attribution License (CC BY). 
The use, distribution or reproduction in other forums is permitted, provided the 
original author(s) or licensor are credited and that the original publication in this
journal is cited, in accordance with accepted academic practice. No use, distribution or 
reproduction is permitted which does not comply with these terms. 






A. APPENDIX 

A.1 PROOF OF THEOREMS 1 AND 2 AND COROLLARIES 1 AND 2 

The proofs in this section follow the pattern of those in Takahashi 
et al. (2010). The chief difference lies in the care needed regarding 
the order of the products between the defining matrices. Here we 
exhibit the main proof ingredients for reader convenience, with 
further details available in Takahashi (2009) and Takahashi et al. 
(2010). 

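Proof of Theorem 1 and Corollary 1. Let $W = [Y^T\ Z^T]^T$ be a second-order
stationary process satisfying the boundedness condition. The following
well-known identity for determinants (Lutkepohl, 1996),

$$\det S_W(\omega) = \det\left(S_{ZZ}(\omega) - S_{ZY}(\omega)S_{YY}^{-1}(\omega)S_{YZ}(\omega)\right)\det S_{YY}(\omega), \tag{A1}$$

leads to

$$C^{b}_{YZ}(\omega) = 1 - \det\left(S_{ZZ}(\omega) - S_{ZY}(\omega)S_{YY}^{-1}(\omega)S_{YZ}(\omega)\right)\det\left(S_{ZZ}^{-1}(\omega)\right) \tag{A2}$$

$$\phantom{C^{b}_{YZ}(\omega)} = 1 - \det\left(S_{YY}(\omega) - S_{YZ}(\omega)S_{ZZ}^{-1}(\omega)S_{ZY}(\omega)\right)\det\left(S_{YY}^{-1}(\omega)\right), \tag{A3}$$

under Equation (5).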
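
As a quick numerical sanity check of Equations (A1)-(A3), the sketch below draws a random Hermitian positive-definite matrix as a stand-in for $S_W(\omega)$ at a single frequency and evaluates the block coherence three ways. It assumes that Equation (5) has the ratio form $1 - \det S_W / (\det S_{YY} \det S_{ZZ})$, which is what (A1) and (A2) together imply. The function names and the test matrix are illustrative and not part of the original work.

```python
import numpy as np


def random_spectral_matrix(n, rng):
    """Random Hermitian positive-definite matrix, a stand-in for S_W at one frequency."""
    g = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return g @ g.conj().T + n * np.eye(n)


def block_coherence_three_ways(s_w, n_y):
    """Evaluate the assumed form of Equation (5), then (A2) and (A3), for W = [Y^T Z^T]^T."""
    det, inv = np.linalg.det, np.linalg.inv
    s_yy, s_yz = s_w[:n_y, :n_y], s_w[:n_y, n_y:]
    s_zy, s_zz = s_w[n_y:, :n_y], s_w[n_y:, n_y:]
    direct = 1 - det(s_w) / (det(s_yy) * det(s_zz))                     # assumed Eq. (5)
    via_a2 = 1 - det(s_zz - s_zy @ inv(s_yy) @ s_yz) * det(inv(s_zz))   # Eq. (A2)
    via_a3 = 1 - det(s_yy - s_yz @ inv(s_zz) @ s_zy) * det(inv(s_yy))   # Eq. (A3)
    # Determinants of Hermitian matrices are real; drop rounding-level imaginary parts.
    return direct.real, via_a2.real, via_a3.real


rng = np.random.default_rng(0)
s_w = random_spectral_matrix(5, rng)   # hypothetical sizes: dim(Y) = 2, dim(Z) = 3
print(block_coherence_three_ways(s_w, n_y=2))
```

The three printed values should agree to rounding error, whichever of the two blocks is used to form the Schur complement.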
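Rewrite bPDC as

$$\pi^{b}_{ij}(\omega) = 1 - \det\left(P_{jj}(\omega) - \bar{A}^{H}_{ij}(\omega)\Sigma_{ii}^{-1}\bar{A}_{ij}(\omega)\right)\det\left(P_{jj}^{-1}(\omega)\right). \tag{A4}$$

Substituting the following identities, proved in Takahashi et al. (2010),

$$P_{jj}(\omega) = S_{\eta_j\eta_j}^{-1}(\omega), \tag{A5}$$

$$S_{\varepsilon_i\eta_j}(\omega) = \bar{A}_{ij}(\omega)S_{\eta_j\eta_j}(\omega), \tag{A6}$$

and, for all $\omega \in [-\pi, \pi)$,

$$S_{\varepsilon_i\varepsilon_i}(\omega) = \Sigma_{ii}, \tag{A7}$$

into Equation (A4) leads to

$$\pi^{b}_{ij}(\omega) = 1 - \det\left(S_{\eta_j\eta_j}(\omega) - S_{\eta_j\varepsilon_i}(\omega)S_{\varepsilon_i\varepsilon_i}^{-1}(\omega)S_{\varepsilon_i\eta_j}(\omega)\right)\det\left(S_{\eta_j\eta_j}^{-1}(\omega)\right). \tag{A8}$$

Comparison with Equation (A2) shows that the right-hand side of
Equation (A8) is $C^{b}_{\varepsilon_i\eta_j}(\omega)$, as we set out to prove. Corollary
1 is immediate from Theorem 1 and Equation (7). □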
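
The back-substitution step from (A4) to (A8) can be spelled out as follows. This is a sketch rather than part of the original proof: it suppresses the dependence on $\omega$ and assumes that $S_{\eta_j\eta_j}$ is Hermitian, so that (A6) also gives $S_{\eta_j\varepsilon_i} = S_{\eta_j\eta_j}\bar{A}^{H}_{ij}$.

```latex
% Assumption: S_{\eta_j\eta_j} is Hermitian, hence
% \bar{A}_{ij} = S_{\varepsilon_i\eta_j} S_{\eta_j\eta_j}^{-1} and
% \bar{A}_{ij}^{H} = S_{\eta_j\eta_j}^{-1} S_{\eta_j\varepsilon_i}.
\begin{align*}
\bar{A}_{ij}^{H}\Sigma_{ii}^{-1}\bar{A}_{ij}
  &= S_{\eta_j\eta_j}^{-1}\, S_{\eta_j\varepsilon_i} S_{\varepsilon_i\varepsilon_i}^{-1} S_{\varepsilon_i\eta_j}\, S_{\eta_j\eta_j}^{-1}, \\
\det\bigl(P_{jj} - \bar{A}_{ij}^{H}\Sigma_{ii}^{-1}\bar{A}_{ij}\bigr)\det\bigl(P_{jj}^{-1}\bigr)
  &= \det\Bigl(S_{\eta_j\eta_j}^{-1}\bigl(S_{\eta_j\eta_j} - S_{\eta_j\varepsilon_i} S_{\varepsilon_i\varepsilon_i}^{-1} S_{\varepsilon_i\eta_j}\bigr) S_{\eta_j\eta_j}^{-1}\Bigr)\det\bigl(S_{\eta_j\eta_j}\bigr) \\
  &= \det\bigl(S_{\eta_j\eta_j} - S_{\eta_j\varepsilon_i} S_{\varepsilon_i\varepsilon_i}^{-1} S_{\varepsilon_i\eta_j}\bigr)\det\bigl(S_{\eta_j\eta_j}^{-1}\bigr).
\end{align*}
```

The last line is the determinant product appearing in (A8); it has the form of (A2) with $Y = \varepsilon_i$ and $Z = \eta_j$.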



Proof of Theorem 2 and Corollary 2. Theorem 2 is obtained by
rewriting $C^{b}_{x_i\xi_j}(\omega)$ using Equation (A3), noting that

$$S_{x_i\xi_j}(\omega) = H_{ij}(\omega)S_{\xi_j\xi_j}(\omega) \tag{A9}$$

and, for all $\omega \in [-\pi, \pi)$,

$$S_{\xi_j\xi_j}(\omega) = \Theta_{jj}^{-1}. \tag{A10}$$

Corollary 2 follows from Theorem 2 and Equation (7). □

A.2 PROOF OF THEOREMS 3 AND 4

Brillinger (1981, chapter 10) introduced the idea of canonical 
coherence for time series. We restate his result under our notation 
as the following theorem. 

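Theorem 5 (Brillinger, Theorem 10.3.2). Let X and Y be $m_1$- and
$m_2$-dimensional time series jointly satisfying the boundedness
condition. For $m \leq \min\{m_1, m_2\}$, the following identity holds:

$$C^{(m)}_{XY}(\omega) = \lambda_m\left(S_{YY}^{-1}(\omega)S_{YX}(\omega)S_{XX}^{-1}(\omega)S_{XY}(\omega)\right) \tag{A11}$$

$$\phantom{C^{(m)}_{XY}(\omega)} = \lambda_m\left(S_{XX}^{-1}(\omega)S_{XY}(\omega)S_{YY}^{-1}(\omega)S_{YX}(\omega)\right). \tag{A12}$$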
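
The equality of (A11) and (A12) can be checked numerically: the two matrix products have different sizes when $m_1 \neq m_2$, yet share their leading $\min\{m_1, m_2\}$ eigenvalues. The sketch below is only an illustration under stated assumptions: the joint spectral matrix is a random Hermitian positive-definite matrix at a single frequency, $\lambda_m$ is taken to be the $m$-th largest eigenvalue, and the function name is hypothetical.

```python
import numpy as np


def canonical_coherences(s_joint, m1):
    """Leading eigenvalues of the matrix products in (A11) and (A12)."""
    inv = np.linalg.inv
    s_xx, s_xy = s_joint[:m1, :m1], s_joint[:m1, m1:]
    s_yx, s_yy = s_joint[m1:, :m1], s_joint[m1:, m1:]
    m_max = min(m1, s_joint.shape[0] - m1)
    eig_a11 = np.linalg.eigvals(inv(s_yy) @ s_yx @ inv(s_xx) @ s_xy)  # m2 x m2 product
    eig_a12 = np.linalg.eigvals(inv(s_xx) @ s_xy @ inv(s_yy) @ s_yx)  # m1 x m1 product

    def leading(ev):
        # Eigenvalues are real up to rounding; sort them in decreasing order.
        return np.sort(ev.real)[::-1][:m_max]

    return leading(eig_a11), leading(eig_a12)


rng = np.random.default_rng(0)
g = rng.standard_normal((5, 5)) + 1j * rng.standard_normal((5, 5))
s_joint = g @ g.conj().T + 5 * np.eye(5)   # hypothetical joint spectrum, m1 = 2, m2 = 3
lam_a11, lam_a12 = canonical_coherences(s_joint, m1=2)
print(lam_a11, lam_a12)
```

In practice the smaller of the two products is the cheaper one to diagonalize, which is one reason to have both forms available.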

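Proof of Theorem 3. From Equations (18) and (A11), we have

$$C^{(m)}_{\varepsilon_i\eta_j}(\omega) = \lambda_m\left(S_{\eta_j\eta_j}^{-1}(\omega)S_{\eta_j\varepsilon_i}(\omega)S_{\varepsilon_i\varepsilon_i}^{-1}(\omega)S_{\varepsilon_i\eta_j}(\omega)\right). \tag{A13}$$

Now, from Equations (A5), (A6), and (A7) it follows that

$$S_{\eta_j\eta_j}^{-1}(\omega)S_{\eta_j\varepsilon_i}(\omega)S_{\varepsilon_i\varepsilon_i}^{-1}(\omega)S_{\varepsilon_i\eta_j}(\omega) = \bar{A}^{H}_{ij}(\omega)\Sigma_{ii}^{-1}\bar{A}_{ij}(\omega)P_{jj}^{-1}(\omega), \tag{A14}$$

which proves Equation (20).

To prove Equation (21), we use Equations (19) and (A12) to obtain

$$C^{(m)}_{x_i\xi_j}(\omega) = \lambda_m\left(S_{x_ix_i}^{-1}(\omega)S_{x_i\xi_j}(\omega)S_{\xi_j\xi_j}^{-1}(\omega)S_{\xi_jx_i}(\omega)\right). \tag{A15}$$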

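Finally, from Equations (A9) and (A10), we have

$$S_{x_ix_i}^{-1}(\omega)S_{x_i\xi_j}(\omega)S_{\xi_j\xi_j}^{-1}(\omega)S_{\xi_jx_i}(\omega) = S_{x_ix_i}^{-1}(\omega)H_{ij}(\omega)\Theta_{jj}^{-1}H^{H}_{ij}(\omega), \tag{A16}$$

which concludes the proof. □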
Proof of Theorem 4. Rewrite bPDC as

$$1 - \pi^{b}_{ij}(\omega) = \det\left(I - \bar{A}^{H}_{ij}(\omega)\Sigma_{ii}^{-1}\bar{A}_{ij}(\omega)P_{jj}^{-1}(\omega)\right). \tag{A17}$$

Now, Equation (22) is a straightforward consequence of the relationship
between eigenvalues and the determinant of a matrix. A
similar argument proves Equation (23). □
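
The eigenvalue-determinant relationship invoked here is $\det(I - M) = \prod_k \left(1 - \lambda_k(M)\right)$, applied to the matrix inside (A17). A minimal numerical sketch follows, assuming the form of (A17) given above. The test matrices are hypothetical and chosen only so that the eigenvalues fall in $[0, 1)$; the function name is likewise illustrative.

```python
import numpy as np


def one_minus_bpdc_two_ways(a_ij, sigma_ii, p_jj):
    """Evaluate the right-hand side of (A17) by determinant and by eigenvalue product."""
    inv = np.linalg.inv
    m_mat = a_ij.conj().T @ inv(sigma_ii) @ a_ij @ inv(p_jj)
    via_det = np.linalg.det(np.eye(m_mat.shape[0]) - m_mat)
    via_eig = np.prod(1 - np.linalg.eigvals(m_mat))
    # Both quantities are real up to rounding for Hermitian sigma_ii and p_jj.
    return via_det.real, via_eig.real


rng = np.random.default_rng(0)
n_i, n_j = 2, 3                                  # hypothetical block sizes
a_ij = rng.standard_normal((n_i, n_j)) + 1j * rng.standard_normal((n_i, n_j))
sigma_ii = np.eye(n_i)                           # hypothetical innovation covariance block
p_jj = a_ij.conj().T @ a_ij + np.eye(n_j)        # hypothetical P_jj dominating A^H Sigma^{-1} A
via_det, via_eig = one_minus_bpdc_two_ways(a_ij, sigma_ii, p_jj)
print(via_det, via_eig)
```

Both numbers should coincide to rounding error, which is the numerical counterpart of passing from the determinant in (A17) to a product over eigenvalues.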
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